In [1]:
%%HTML
<style>
.container { width:100% }
</style>


Utilities

The global variable Cache is used as a cache for the function value that is defined later.


In [2]:
Cache = {}

Alpha-Beta Pruning with Depth-Limited Search and Move Ordering

In order to have some variation in our game, we use random numbers to choose between optimal moves.


In [3]:
import random
random.seed(2)

Given a player p, the function other(p) computes the opponent of p. This assumes that there are only two players and the set of all players is stored in the global variable Players.


In [4]:
def other(p):
    return [o for o in Players if o != p][0]

The module heapq implements heaps. We use these as priority queues.


In [5]:
import heapq
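
The cell below is only a brief illustration (the priorities and labels are made up and are not real game states): heapq maintains a min-heap, so heappop always returns the entry with the smallest first component. This is the reason why the move ordering further below pushes cached values without a leading minus sign.


In [ ]:
import heapq

# minimal sketch: heapq used as a priority queue of (priority, item) pairs
PQ = []
heapq.heappush(PQ, ( 1, 'move a'))
heapq.heappush(PQ, (-1, 'move b'))
heapq.heappush(PQ, ( 0, 'move c'))
while PQ:
    prio, move = heapq.heappop(PQ)  # the smallest priority is popped first
    print(prio, move)               # prints -1 move b, then 0 move c, then 1 move a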

The function value takes six arguments:

  • State is the current state of the game,
  • player is a player,
  • limit determines the lookahead. To be more precise, it is the number of half-moves that are investigated to compute the value. If limit is 0 and the game has not yet ended, the state is evaluated with the function heuristic.
  • heuristic is a function that takes a state and a player and estimates the value of the state for that player.
  • alpha is a lower bound on the result: the worst outcome that player has to accept.
  • beta is an upper bound on the result: the best outcome that player can hope for.

The function value returns the value that the given State has for player if both players play optimally. If the lookahead is deep enough to decide the game, this value is an element of the set $\{-1, 0, 1\}$:

  • If player can force a win, the return value is 1.
  • If player can at best force a draw, the return value is 0.
  • If player might lose even when playing optimally, the return value is -1.

Otherwise, the value is based on the heuristic estimates at the search horizon. For reasons of efficiency, this function is memoized. Since alpha-beta pruning computes values that are only guaranteed with respect to the window (alpha, beta) that was used, the Cache stores this window together with the value: a cached value is reused only if the stored window contains the current one; otherwise the value is recomputed with a window that encloses both the stored and the current one.


In [6]:
def value(State, player, limit, heuristic, alpha=-1, beta=1):
    global Cache
    if (State, limit) in Cache:
        val, a, b = Cache[(State, limit)]
        if a <= alpha and beta <= b:
            return val
        else:
            alp = min(alpha, a)
            bet = max(beta , b)
            val = alphaBeta(State, player, limit, heuristic, alp, bet)
            Cache[(State, limit)] = val, alp, bet
            return val
    else:
        val = alphaBeta(State, player, limit, heuristic, alpha, beta)
        Cache[(State, limit)] = val, alpha, beta
        return val

The function value_cache returns the value that is stored in the Cache for the given State and limit. If there is no entry for this pair, it returns the neutral value 0.


In [7]:
def value_cache(State, limit):
    val, _, _ = Cache.get((State, limit), (0, -1, 1))
    return val
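
As a small illustration: the Cache maps pairs (State, limit) to triples (value, alpha, beta), and value_cache falls back to the neutral value 0 for positions that have not been evaluated yet, so unexplored moves get a middle priority in the move ordering below. The key used in the next cell is a made-up placeholder, not a real game state.


In [ ]:
# a position that has never been evaluated gets the neutral default value 0
# ('fresh state' is a hypothetical placeholder key, not a real game state)
value_cache('fresh state', 3)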

The function alphaBeta implements the depth-limited alpha-beta search in negamax form: the value of a state for player is the negated value of that state for the opponent, so the recursive call negates its result and searches with the flipped window (-beta, -alpha). Before the successor states are explored, they are ordered in a priority queue according to values that may already be stored in the Cache from earlier searches. Since these cached values are taken from the opponent's point of view, the successors with the smallest cached value, i.e. those that are worst for the opponent, are tried first, which leads to earlier cutoffs.


In [8]:
def alphaBeta(State, player, limit, heuristic, alpha=-1, beta=1):
    if finished(State):
        return utility(State, player)
    if limit == 0:
        return heuristic(State, player)
    val        = alpha
    NextStates = next_states(State, player)
    Moves      = []  # empty priority queue
    for ns in NextStates:
        # no '-' in front of value as smallest value has highest priority
        # limit-3 is the value from the previous iteration
        heapq.heappush(Moves, (value_cache(ns, limit-3), ns))
    while Moves:
        _, ns = heapq.heappop(Moves)
        val = max(val, -value(ns, other(player), limit-1, heuristic, -beta, -alpha))
        if val >= beta:
            return val
        alpha = max(val, alpha)
    return val
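
The cell below is a small, self-contained sketch of the same negamax alpha-beta scheme on a hypothetical game tree; the tree, its leaf values, and the function toy_alpha_beta are made up for illustration only and are not used anywhere else in this notebook. It shows how negating the recursive result and flipping the window to (-beta, -alpha) yields the value for the player to move.


In [ ]:
# hypothetical game tree: inner nodes map to their successors, leaves carry the
# value of the finished game for the player whose turn it would be at that leaf
ToyTree = { 'root': ['a', 'b'],
            'a'   : ['a1', 'a2'],
            'b'   : ['b1', 'b2']
          }
ToyLeaf = { 'a1': 1, 'a2': 0, 'b1': -1, 'b2': 0 }

def toy_alpha_beta(node, alpha=-1, beta=1):
    if node in ToyLeaf:                   # terminal node: return its stored value
        return ToyLeaf[node]
    val = alpha
    for child in ToyTree[node]:
        # the child is evaluated from the opponent's point of view, hence the
        # negation of the result and the flipped window (-beta, -alpha)
        val = max(val, -toy_alpha_beta(child, -beta, -alpha))
        if val >= beta:                   # cutoff: the opponent avoids this branch anyway
            return val
        alpha = max(alpha, val)
    return val

toy_alpha_beta('root')                    # the player to move can force a draw: value 0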

The function best_move takes four arguments:

  • State is the current state of the game,
  • player is a player,
  • limit is the depth of the lookahead, as in the function value,
  • heuristic is the function used to evaluate states at the search horizon.

The function best_move returns a pair of the form $(v, s)$ where $s$ is a state and $v$ is the value of this state. The state $s$ is a state that is reached from State if player makes one of her optimal moves. In order to have some variation in the game, the function chooses one of the optimal moves at random.


In [9]:
def best_move(State, player, limit, heuristic):
    NS        = next_states(State, player)
    bestVal   = value(State, player, limit, heuristic)
    BestMoves = [s for s in NS if -value(s, other(player), limit-1, heuristic) == bestVal]
    BestState = random.choice(BestMoves)
    return bestVal, BestState

The next import is needed because we use the function IPython.display.clear_output to clear the output of a cell.


In [10]:
import IPython.display

The module time is used to measure how long the computation of a move takes.


In [11]:
import time

The function play_game plays a game on the given canvas. The game itself is specified indirectly via the following definitions; a minimal sketch of this interface is given after the list.

  • Start is a global variable defining the start state of the game.
  • next_states is a function such that $\texttt{next_states}(s, p)$ computes the set of all possible states that can be reached from state $s$ if player $p$ is next to move.
  • finished is a function such that $\texttt{finished}(s)$ is true for a state $s$ if the game is over in state $s$.
  • utility is a function such that $\texttt{utility}(s, p)$ returns either -1, 0, or 1 in the terminal state $s$. We have that
    • $\texttt{utility}(s, p)= -1$ iff the game is lost for player $p$ in state $s$,
    • $\texttt{utility}(s, p)= 0$ iff the game is drawn, and
    • $\texttt{utility}(s, p)= 1$ iff the game is won for player $p$ in state $s$.
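
As a purely illustrative sketch of this interface, the cell below defines Start, next_states, finished, and utility (together with Players, which the function other relies on) for a hypothetical single-pile Nim game: each player removes one, two, or three matches, and whoever takes the last match wins. These toy definitions are not used anywhere else; if the cell is executed, they are overwritten by the %run commands further below that load the real game definitions.


In [ ]:
# hypothetical interface sketch for a single-pile Nim game
Players = [ 'A', 'B' ]
Start   = 5                       # number of matches on the pile

def next_states(s, p):
    # player p removes one, two, or three matches from the pile
    return { s - k for k in (1, 2, 3) if s - k >= 0 }

def finished(s):
    return s == 0                 # no matches left: the game is over

def utility(s, p):
    # only called for terminal states: player p faces an empty pile,
    # so the opponent has taken the last match and p has lost
    return -1

# sanity check: with five matches the player to move can force a win
print(value(Start, 'A', limit=10, heuristic=lambda s, p: 0))
Cache = {}                        # discard the toy entries again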

In [12]:
def play_game(canvas, limit, heuristic):
    global Cache
    Cache   = {}
    State   = Start
    History = []
    while True:
        firstPlayer = Players[0]
        start       = time.time()
        val, State  = best_move(State, firstPlayer, limit, heuristic)
        stop        = time.time()
        diff        = round(stop - start, 2)
        History.append(diff)
        draw(State, canvas, f'{round(diff, 2)} seconds, value = {round(val, 2)}.')
        if finished(State):
            final_msg(State)
            break
        IPython.display.clear_output(wait=True)
        State = get_move(State)
        draw(State, canvas, '')
        if finished(State):
            IPython.display.clear_output(wait=True)
            final_msg(State)
            break
    for i, d in enumerate(History):
        print(f'{i}: {d} seconds')

Play Tic-Tac-Toe


In [ ]:
%run Tic-Tac-Toe.ipynb

With memoization, computing the value of the start state takes 95 ms. Without memoization, it takes 5 seconds.


In [ ]:
%%time
val = value(Start, 'X', 6, heuristic)

We check how many different states are stored in the Cache. Without alpha-beta pruning, we had to inspect 5478 different states.


In [ ]:
len(Cache)

Let's draw the board.


In [ ]:
canvas = create_canvas(Start)
draw(Start, canvas, f'Current value of game for "X": {val}')

Now it's time to play. In the input window that will pop up later, enter your move in the format "row,col" with no space between row and column.


In [ ]:
play_game(canvas, 6, heuristic)

Play Connect Four


In [13]:
%run Connect-Four.ipynb



In [14]:
%%time
val = value(Start, 'X', 10, heuristic)


CPU times: user 1min 9s, sys: 174 ms, total: 1min 9s
Wall time: 1min 9s

In [15]:
len(Cache)


Out[15]:
203043

In [16]:
canvas = create_canvas(Start)
draw(Start, canvas, f'Current value of game for "X": {round(val, 2)}')



In [17]:
play_game(canvas, 9, heuristic)


Enter column here: 1
You have lost!
0: 44.71 seconds
1: 16.64 seconds
2: 14.24 seconds
3: 7.86 seconds
4: 15.74 seconds
5: 10.31 seconds
6: 10.6 seconds
7: 6.89 seconds
8: 1.8 seconds
9: 1.18 seconds
10: 1.24 seconds
11: 0.4 seconds
12: 0.33 seconds
13: 0.29 seconds
14: 0.05 seconds
15: 0.04 seconds
16: 0.0 seconds
17: 0.0 seconds
18: 0.0 seconds
19: 0.0 seconds
20: 0.0 seconds

In [ ]:
len(Cache)
